AITopics | angular error

Collaborating Authors

angular error

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LuxDiT: Lighting Estimation with Video Diffusion Transformer

Neural Information Processing SystemsJun-14-2026, 12:12:23 GMT

Estimating scene lighting from a single image or video remains a longstand-ing challenge in computer vision and graphics. Learning-based approaches areconstrained by the scarcity of ground-truth HDR environment maps, which areexpensive to capture and limited in diversity. While recent generative modelsoffer strong priors for image synthesis, lighting estimation remains difficult dueto its reliance on indirect visual cues, the need to infer global (non-local) con-text, and the recovery of high-dynamic-range outputs. We propose LuxDiT, anovel data-driven approach that fine-tunes a video diffusion transformer to gen-erate HDR environment maps conditioned on visual input. Trained on a largesynthetic dataset with diverse lighting conditions, our model learns to infer il-lumination from indirect visual cues and generalizes effectively to real-worldscenes. To improve semantic alignment between the input and the predicted environment map, we introduce a low-rank adaptation finetuning strategy using a collected dataset of HDR panoramas.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Adaptive Plane Reformatting for 4D Flow MRI using Deep Reinforcement Learning

Bisbal, Javier, Sotelo, Julio, Valdés, Maria I, Irarrazaval, Pablo, Andia, Marcelo E, García, Julio, Rodriguez-Palomarez, José, Raimondi, Francesca, Tejos, Cristián, Uribe, Sergio

arXiv.org Artificial IntelligenceDec-2-2025

Background and Objective: Plane reformatting for four-dimensional phase contrast MRI (4D flow MRI) is time-consuming and prone to inter-observer variability, which limits fast cardiovascular flow assessment. Deep reinforcement learning (DRL) trains agents to iteratively adjust plane position and orientation, enabling accurate plane reformatting without the need for detailed landmarks, making it suitable for images with limited contrast and resolution such as 4D flow MRI. However, current DRL methods assume that test volumes share the same spatial alignment as the training data, limiting generalization across scanners and institutions. To address this limitation, we introduce AdaPR (Adaptive Plane Reformatting), a DRL framework that uses a local coordinate system to navigate volumes with arbitrary positions and orientations. Methods: We implemented AdaPR using the Asynchronous Advantage Actor-Critic (A3C) algorithm and validated it on 88 4D flow MRI datasets acquired from multiple vendors, including patients with congenital heart disease. Results: AdaPR achieved a mean angular error of 6.32 +/- 4.15 degrees and a distance error of 3.40 +/- 2.75 mm, outperforming global-coordinate DRL methods and alternative non-DRL methods. AdaPR maintained consistent accuracy under different volume orientations and positions. Flow measurements from AdaPR planes showed no significant differences compared to two manual observers, with excellent correlation (R^2 = 0.972 and R^2 = 0.968), comparable to inter-observer agreement (R^2 = 0.969). Conclusion: AdaPR provides robust, orientation-independent plane reformatting for 4D flow MRI, achieving flow quantification comparable to expert observers. Its adaptability across datasets and scanners makes it a promising candidate for medical imaging applications beyond 4D flow MRI.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2506.00727

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Clustering Guided Residual Neural Networks for Multi-Tx Localization in Molecular Communications

Sonmez, Ali, Ozbey, Erencem, Mantaroglu, Efe Feyzi, Yilmaz, H. Birkan

arXiv.org Artificial IntelligenceNov-12-2025

Abstract--Transmitter localization in Molecular Communication via Diffusion is a critical topic with many applications. However, accurate localization of multiple transmitters is a challenging problem due to the stochastic nature of diffusion and overlapping molecule distributions at the receiver surface. T o address these issues, we introduce clustering-based centroid correction methods that enhance robustness against density variations, and outliers. In addition, we propose two clustering-guided Residual Neural Networks, namely AngleNN for direction refinement and SizeNN for cluster size estimation. Experimental results show that both approaches provide significant improvements with reducing localization error between 69% (2-Tx) and 43% (4-Tx) compared to the K-means.

artificial intelligence, localization, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.08513

Country: Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

SLYKLatent: A Learning Framework for Gaze Estimation Using Deep Facial Feature Learning

Adebayo, Samuel, Dessing, Joost C., McLoone, Seán

arXiv.org Artificial IntelligenceNov-6-2025

In this research, we present SLYKLatent, a novel approach for enhancing gaze estimation by addressing appearance instability challenges in datasets due to aleatoric uncertainties, covariant shifts, and test domain generalization. SLYKLatent utilizes Self-Supervised Learning for initial training with facial expression datasets, followed by refinement with a patch-based tri-branch network and an inverse explained variance-weighted training loss function. Our evaluation on benchmark datasets achieves a 10.9% improvement on Gaze360, supersedes top MPIIFaceGaze results with 3.8%, and leads on a subset of ETH-XGaze by 11.6%, surpassing existing methods by significant margins. Adaptability tests on RAF-DB and Affectnet show 86.4% and 60.9% accuracies, respectively. Ablation studies confirm the effectiveness of SLYKLatent's novel components.

artificial intelligence, estimation, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/THMS.2025.3553404

2402.01555

Country:

Europe > United Kingdom (0.46)
Africa > Nigeria (0.28)

Genre: Research Report > Promising Solution (0.48)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)

Add feedback

Gaze Estimation for Human-Robot Interaction: Analysis Using the NICO Platform

Palider, Matej, Eldardeer, Omar, Kocur, Viktor

arXiv.org Artificial IntelligenceOct-9-2025

This is mainly because of the importance of the gaze as a social non-verbal cue for interaction [3]. It drives many different social cognitive mechanisms (such as joint attention, intention prediction, and task coordination) and provides an explainable behaviour for others [4,5]. Affective states are also represented in the gaze behaviour [6]. The ability to perceive and understand the social cues affects the effectiveness and efficiency of the whole interaction experience. Gaze understanding and following is one of the earliest behavior mechanisms developed by infants to engage in different social communication scenarios [7]. Therefore, achieving high accuracy in gaze estimation is a key enabler to reach a seamless Human-Robot interaction task. Despite the significant progress of gaze estimation methodologies, these methods remain not fully evaluated in real human-robot interaction scenarios. In this paper we present an applied evaluation for the latest gaze estimation methods in a standard HRI scenario, specifically when the human and the robot are engaged in a shared task space (e.g., table surface).

artificial intelligence, estimation, estimation method, (14 more...)

arXiv.org Artificial Intelligence

2509.24001

Country: Europe (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.92)

Add feedback

LuxDiT: Lighting Estimation with Video Diffusion Transformer

Liang, Ruofan, He, Kai, Gojcic, Zan, Gilitschenski, Igor, Fidler, Sanja, Vijaykumar, Nandita, Wang, Zian

arXiv.org Artificial IntelligenceSep-5-2025

Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics. Learning-based approaches are constrained by the scarcity of ground-truth HDR environment maps, which are expensive to capture and limited in diversity. While recent generative models offer strong priors for image synthesis, lighting estimation remains difficult due to its reliance on indirect visual cues, the need to infer global (non-local) context, and the recovery of high-dynamic-range outputs. We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input. Trained on a large synthetic dataset with diverse lighting conditions, our model learns to infer illumination from indirect visual cues and generalizes effectively to real-world scenes. To improve semantic alignment between the input and the predicted environment map, we introduce a low-rank adaptation finetuning strategy using a collected dataset of HDR panoramas. Our method produces accurate lighting predictions with realistic angular high-frequency details, outperforming existing state-of-the-art techniques in both quantitative and qualitative evaluations.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.0368

Country: North America (0.28)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Probabilistic Task Parameterization of Tool-Tissue Interaction via Sparse Landmarks Tracking in Robotic Surgery

Wang, Yiting, Fan, Yunxin, Liu, Fei

arXiv.org Artificial IntelligenceApr-17-2025

Accurate modeling of tool-tissue interactions in robotic surgery requires precise tracking of deformable tissues and integration of surgical domain knowledge. Traditional methods rely on labor-intensive annotations or rigid assumptions, limiting flexibility. We propose a framework combining sparse keypoint tracking and probabilistic modeling that propagates expert-annotated landmarks across endoscopic frames, even with large tissue deformations. Clustered tissue keypoints enable dynamic local transformation construction via PCA, and tool poses, tracked similarly, are expressed relative to these frames. Embedding these into a Task-Parameterized Gaussian Mixture Model (TP-GMM) integrates data-driven observations with labeled clinical expertise, effectively predicting relative tool-tissue poses and enhancing visual understanding of robotic surgical motions directly from video data.

artificial intelligence, machine learning, tool-tissue interaction, (10 more...)

arXiv.org Artificial Intelligence

2504.11495

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.15)
North America > United States > California > Santa Clara County (0.15)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)

Add feedback

AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion

Xie, Liuyue, Guo, Jiancong, Cakmakci, Ozan, Araujo, Andre, Jeni, Laszlo A., Jia, Zhiheng

arXiv.org Artificial IntelligenceMar-27-2025

Accurate camera calibration is a fundamental task for 3D perception, especially when dealing with real-world, in-the-wild environments where complex optical distortions are common. Existing methods often rely on pre-rectified images or calibration patterns, which limits their applicability and flexibility. In this work, we introduce a novel framework that addresses these challenges by jointly modeling camera intrinsic and extrinsic parameters using a generic ray camera model. Unlike previous approaches, AlignDiff shifts focus from semantic to geometric features, enabling more accurate modeling of local distortions. We propose AlignDiff, a diffusion model conditioned on geometric priors, enabling the simultaneous estimation of camera distortions and scene geometry. To enhance distortion prediction, we incorporate edge-aware attention, focusing the model on geometric features around image edges, rather than semantic content. Furthermore, to enhance generalizability to real-world captures, we incorporate a large database of ray-traced lenses containing over three thousand samples. This database characterizes the distortion inherent in a diverse variety of lens forms. Our experiments demonstrate that the proposed method significantly reduces the angular error of estimated ray bundles by ~8.2 degrees and overall calibration accuracy, outperforming existing approaches on challenging, real-world datasets.

aberration, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.21581

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Media > Photography (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting

Yang, Linqi, Zhao, Xiongwei, Sun, Qihao, Wang, Ke, Chen, Ao, Kang, Peng

arXiv.org Artificial IntelligenceMar-7-2025

6-DoF pose estimation is a fundamental task in computer vision with wide-ranging applications in augmented reality and robotics. Existing single RGB-based methods often compromise accuracy due to their reliance on initial pose estimates and susceptibility to rotational ambiguity, while approaches requiring depth sensors or multi-view setups incur significant deployment costs. To address these limitations, we introduce SplatPose, a novel framework that synergizes 3D Gaussian Splatting (3DGS) with a dual-branch neural architecture to achieve high-precision pose estimation using only a single RGB image. Central to our approach is the Dual-Attention Ray Scoring Network (DARS-Net), which innovatively decouples positional and angular alignment through geometry-domain attention mechanisms, explicitly modeling directional dependencies to mitigate rotational ambiguity. Additionally, a coarse-to-fine optimization pipeline progressively refines pose estimates by aligning dense 2D features between query images and 3DGS-synthesized views, effectively correcting feature misalignment and depth errors from sparse ray sampling. Experiments on three benchmark datasets demonstrate that SplatPose achieves state-of-the-art 6-DoF pose estimation accuracy in single RGB settings, rivaling approaches that depend on depth or multi-view images.

artificial intelligence, machine learning, pose estimation, (16 more...)

arXiv.org Artificial Intelligence

2503.05174

Country:

Asia > China > Heilongjiang Province > Harbin (0.05)
Asia > China > Henan Province > Zhengzhou (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ViIK: Flow-based Vision Inverse Kinematics Solver with Fusing Collision Checking

Meng, Qinglong, Xia, Chongkun, Wang, Xueqian

arXiv.org Artificial IntelligenceAug-28-2024

Inverse Kinematics (IK) is to find the robot's configurations that satisfy the target pose of the end effector. In motion planning, diverse configurations were required in case a feasible trajectory was not found. Meanwhile, collision checking (CC), e.g. Oriented bounding box (OBB), Discrete Oriented Polytope (DOP), and Quickhull \cite{quickhull}, needs to be done for each configuration provided by the IK solver to ensure every goal configuration for motion planning is available. This means the classical IK solver and CC algorithm should be executed repeatedly for every configuration. Thus, the preparation time is long when the required number of goal configurations is large, e.g. motion planning in cluster environments. Moreover, structured maps, which might be difficult to obtain, were required by classical collision-checking algorithms. To sidestep such two issues, we propose a flow-based vision method that can output diverse available configurations by fusing inverse kinematics and collision checking, named Vision Inverse Kinematics solver (ViIK). Moreover, ViIK uses RGB images as the perception of environments. ViIK can output 1000 configurations within 40 ms, and the accuracy is about 3 millimeters and 1.5 degrees. The higher accuracy can be obtained by being refined by the classical IK solver within a few iterations. The self-collision rates can be lower than 2%. The collision-with-env rates can be lower than 10% in most scenes. The code is available at: https://github.com/AdamQLMeng/ViIK.

configuration, ik solver, viik, (15 more...)

arXiv.org Artificial Intelligence

2408.11293

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback